EpiCompare compares epigenetic datasets for quality control and benchmarking purposes. The report consists of three sections:

  1. General Metrics: Metrics on peaks (percentage of blacklisted and non-standard peaks, and peak widths) and fragments (duplication rate) of samples.
  2. Peak Overlap: Percentage and statistical significance of overlapping and non-overlapping peaks. Also includes an UpSet plot.
  3. Functional Annotation: Functional annotation (ChromHMM, ChIPseeker and enrichment analysis) of peaks. Also includes peak frequency around transcription start sites (TSSs).

Input Datasets
  • Reference peakfile: H3K4me3_EN4G38_E4_IRpseudo2
  • Total of 15 peak files:
      File1: H3K4me3_EN4G38_E1_IRpseudo12
      File2: H3K4me3_EN4G38_E1_IRpseudo1
      File3: H3K4me3_EN4G38_E1_IRpseudo2
      File4: H3K4me3_EN4G38_E1_IR12
      File5: H3K4me3_EN3G38_E1_IR12
      File6: H3K4me3_EN4G38_E2_IRpseudo12
      File7: H3K4me3_EN4G38_E2_IRpseudo1
      File8: H3K4me3_EN4G38_E2_IRpseudo2
      File9: H3K4me3_EN4G38_E2_IR12
      File10: H3K4me3_EN4G38_E3_IR12
      File11: H3K4me3_EN4G38_E3_IRpseudo1
      File12: H3K4me3_EN4G38_E3_IRpseudo2
      File13: H3K4me3_EN4G38_E4_IRpseudo12
      File14: H3K4me3_EN4G38_E4_IRpseudo1
      File15: H3K4me3_EN4G38_E4_IRpseudo2
Code
library(EpiCompare)

# Assumes the peak files, the reference and the blacklist are GRanges objects
# already loaded in the session (e.g. imported with rtracklayer::import()).
EpiCompare(peakfiles = list(H3K4me3_EN4G38_E1_IRpseudo12, H3K4me3_EN4G38_E1_IRpseudo1, H3K4me3_EN4G38_E1_IRpseudo2,
                            H3K4me3_EN4G38_E1_IR12, H3K4me3_EN3G38_E1_IR12, H3K4me3_EN4G38_E2_IRpseudo12,
                            H3K4me3_EN4G38_E2_IRpseudo1, H3K4me3_EN4G38_E2_IRpseudo2, H3K4me3_EN4G38_E2_IR12,
                            H3K4me3_EN4G38_E3_IR12, H3K4me3_EN4G38_E3_IRpseudo1, H3K4me3_EN4G38_E3_IRpseudo2,
                            H3K4me3_EN4G38_E4_IRpseudo12, H3K4me3_EN4G38_E4_IRpseudo1, H3K4me3_EN4G38_E4_IRpseudo2),
           genome_build = "hg38",
           blacklist = blacklist,
           picard_files = list(),
           reference = H3K4me3_EN4G38_E4_IRpseudo2,
           stat_plot = TRUE,
           chromHMM_plot = TRUE,
           chromHMM_annotation = "K562",
           chipseeker_plot = TRUE,
           enrichment_plot = TRUE,
           interact = TRUE,
           save_output = TRUE,
           output_dir = "/Users/apple/Downloads/cutandrun/ENCODE_correlation/K562_H3K4me3")

1. General Metrics

Peak Information

Column Description:

  • PeakN before tidy: Total number of peaks including those blacklisted and those in non-standard chromosomes.
  • Blacklisted peaks removed (%): Percentage of peaks that fall in blacklisted regions. For example, the ENCODE hg19 blacklist contains regions of the hg19 genome that show anomalous and/or unstructured signal independent of cell line or experiment.
  • Non-standard peaks removed (%): Percentage of peaks in non-standard and/or mitochondrial chromosomes. Identified using BRGenomics::tidyChromosomes().
  • PeakN after tidy: Total number of peaks after removing those in blacklisted regions and non-standard chromosomes.

NB: EpiCompare uses the filtered peak files (i.e. after removing peaks in blacklisted regions and non-standard chromosomes) for its analyses.


Sample                       | PeakN before tidy | Blacklisted peaks removed (%) | Non-standard peaks removed (%) | PeakN after tidy
-----------------------------|-------------------|-------------------------------|--------------------------------|-----------------
H3K4me3_EN4G38_E1_IRpseudo12 | 21146             | 0.0000                        | 0.0000                         | 21146
H3K4me3_EN4G38_E1_IRpseudo1  | 19324             | 0.0000                        | 0.0000                         | 19324
H3K4me3_EN4G38_E1_IRpseudo2  | 20493             | 0.0000                        | 0.0000                         | 20493
H3K4me3_EN4G38_E1_IR12       | 19707             | 0.0000                        | 0.0000                         | 19707
H3K4me3_EN3G38_E1_IR12       | 21402             | 0.0514                        | 0.0514                         | 21380
H3K4me3_EN4G38_E2_IRpseudo12 | 23132             | 0.0000                        | 0.0000                         | 23132
H3K4me3_EN4G38_E2_IRpseudo1  | 22558             | 0.0000                        | 0.0000                         | 22558
H3K4me3_EN4G38_E2_IRpseudo2  | 22315             | 0.0000                        | 0.0000                         | 22315
H3K4me3_EN4G38_E2_IR12       | 21743             | 0.0000                        | 0.0000                         | 21743
H3K4me3_EN4G38_E3_IR12       | 29697             | 0.0000                        | 0.0000                         | 29697
H3K4me3_EN4G38_E3_IRpseudo1  | 26395             | 0.0000                        | 0.0000                         | 26395
H3K4me3_EN4G38_E3_IRpseudo2  | 25916             | 0.0000                        | 0.0000                         | 25916
H3K4me3_EN4G38_E4_IRpseudo12 | 43357             | 0.0000                        | 0.0000                         | 43357
H3K4me3_EN4G38_E4_IRpseudo1  | 40268             | 0.0000                        | 0.0000                         | 40268
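
The columns above can be illustrated with a minimal base-R sketch of the "tidy" step. It assumes peaks and blacklist regions are plain data.frames with chr/start/end columns in closed coordinates, and it treats chr1-chr22, chrX and chrY as the standard chromosomes; `tidy_peaks()` and `overlaps_any()` are hypothetical helpers, not EpiCompare functions (EpiCompare itself works on GRanges and uses BRGenomics::tidyChromosomes()).

```r
# Hypothetical sketch of the peak "tidy" step (not EpiCompare's own code).
# Assumed input: data.frames with chr, start, end (closed coordinates).
standard_chroms <- paste0("chr", c(1:22, "X", "Y"))  # assumed standard set; chrM excluded

# TRUE for each peak that overlaps at least one region on the same chromosome
overlaps_any <- function(peaks, regions) {
  vapply(seq_len(nrow(peaks)), function(i) {
    r <- regions[regions$chr == peaks$chr[i], , drop = FALSE]
    any(r$start <= peaks$end[i] & r$end >= peaks$start[i])
  }, logical(1))
}

tidy_peaks <- function(peaks, blacklist) {
  n_before <- nrow(peaks)
  in_blacklist <- overlaps_any(peaks, blacklist)
  non_standard <- !(peaks$chr %in% standard_chroms)
  kept <- peaks[!in_blacklist & !non_standard, , drop = FALSE]
  data.frame(
    peakN_before     = n_before,
    blacklisted_pct  = 100 * sum(in_blacklist) / n_before,
    non_standard_pct = 100 * sum(non_standard) / n_before,
    peakN_after      = nrow(kept)
  )
}

peaks <- data.frame(chr   = c("chr1", "chr1", "chrM", "chrUn_gl000220"),
                    start = c(100, 1000, 50, 10),
                    end   = c(200, 1100, 80, 20))
blacklist <- data.frame(chr = "chr1", start = 150, end = 160)
tidy_peaks(peaks, blacklist)
```

In this sketch a peak that is both blacklisted and on a non-standard chromosome counts towards both percentages but is removed only once.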

Fragment Information

Fragment metrics are shown only if Picard summary files are provided via picard_files (in this run picard_files = list(), so no fragment table is shown). See the manual for help.

Column Description:

  • Mapped_Fragments: Number of mapped read pairs in the file.
  • Duplication_Rate: Percentage of mapped sequence that is marked as duplicate.
  • Unique_Fragments: Number of mapped fragments that are not marked as duplicates.
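
The three columns are related by simple arithmetic. The sketch below assumes counts of mapped read pairs and of duplicate-marked read pairs are already available; `fragment_metrics()` is a hypothetical helper, and which Picard fields EpiCompare reads is not shown in this report.

```r
# Hypothetical helper (not EpiCompare's code): derive the three fragment
# metrics from counts of mapped read pairs and duplicate-marked read pairs.
fragment_metrics <- function(mapped_pairs, duplicate_pairs) {
  stopifnot(all(duplicate_pairs >= 0), all(duplicate_pairs <= mapped_pairs))
  data.frame(
    Mapped_Fragments = mapped_pairs,
    Duplication_Rate = 100 * duplicate_pairs / mapped_pairs,  # percentage
    Unique_Fragments = mapped_pairs - duplicate_pairs
  )
}

fragment_metrics(mapped_pairs = c(1e6, 2e6), duplicate_pairs = c(1.5e5, 1e5))
```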



Peak widths

Distribution of peak widths in samples
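
Peak width depends on the coordinate convention of the input: BED files are 0-based and half-open (width = end - start), whereas GRanges objects are 1-based and closed (width = end - start + 1, the value GenomicRanges::width() returns). A small base-R sketch of a per-sample width summary follows; `summarise_widths()` and its list-of-data.frames input are hypothetical.

```r
# Coordinate conventions (inputs are plain numeric vectors):
bed_width     <- function(start, end) end - start      # BED: 0-based, half-open
granges_width <- function(start, end) end - start + 1  # GRanges: 1-based, closed

# Hypothetical helper: per-sample summary of peak widths for a named list of
# data.frames with 1-based, closed start/end columns.
summarise_widths <- function(peak_list) {
  rows <- lapply(names(peak_list), function(nm) {
    w <- granges_width(peak_list[[nm]]$start, peak_list[[nm]]$end)
    data.frame(sample = nm, n = length(w), median = median(w),
               q25 = unname(quantile(w, 0.25)), q75 = unname(quantile(w, 0.75)))
  })
  do.call(rbind, rows)
}

peak_list <- list(A = data.frame(start = c(1, 101), end = c(100, 300)),
                  B = data.frame(start = 1, end = 50))
summarise_widths(peak_list)
```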



2. Peak Overlap

Percentage Overlap

Percentage of overlapping peaks between samples. Hover over the heatmap to see the percentage values.

N.B. How to read the heatmap: each cell shows the percentage of [x-axis sample] peaks found in [y-axis sample] peaks.
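
Because each cell is "x peaks found in y peaks", the heatmap is generally not symmetric. A minimal base-R sketch of how such a matrix could be computed, assuming peaks are data.frames with chr/start/end in closed coordinates; `pct_overlap()` and `overlap_matrix()` are hypothetical stand-ins for the GenomicRanges overlap machinery EpiCompare builds on.

```r
# Hypothetical sketch (not EpiCompare's implementation): percentage of peaks
# in x that overlap at least one peak in y.
pct_overlap <- function(x, y) {
  hit <- vapply(seq_len(nrow(x)), function(i) {
    same_chr <- y$chr == x$chr[i]
    any(y$start[same_chr] <= x$end[i] & y$end[same_chr] >= x$start[i])
  }, logical(1))
  100 * mean(hit)
}

# Cell [y, x] holds the percentage of x peaks found in y peaks, matching the
# "[x-axis sample] peaks in [y-axis sample] peaks" reading of the heatmap.
overlap_matrix <- function(peak_list) {
  nms <- names(peak_list)
  m <- matrix(NA_real_, length(nms), length(nms), dimnames = list(y = nms, x = nms))
  for (xn in nms) {
    for (yn in nms) m[yn, xn] <- pct_overlap(peak_list[[xn]], peak_list[[yn]])
  }
  m
}

a <- data.frame(chr = "chr1", start = c(1, 100, 500), end = c(10, 110, 510))
b <- data.frame(chr = "chr1", start = 5, end = 105)
overlap_matrix(list(A = a, B = b))
```

Here two of the three A peaks fall in B, while the single B peak falls in A, so the two off-diagonal cells differ.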



Upset Plot

UpSet plot of overlapping peaks between samples. See here for how to interpret the plot.
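
Each intersection bar in an UpSet plot counts regions by their exact combination of samples. The sketch below assumes a logical membership matrix (rows = regions of a merged peak set, columns = samples); how EpiCompare builds that matrix is not described here, and `upset_counts()` is a hypothetical helper.

```r
# Hypothetical helper: count regions per exact sample combination, as shown by
# the intersection bars of an UpSet plot. `membership` is a logical matrix
# with one row per region and one named column per sample.
upset_counts <- function(membership) {
  combo <- apply(membership, 1, function(in_sample) {
    paste(colnames(membership)[in_sample], collapse = "&")
  })
  sort(table(combo), decreasing = TRUE)
}

membership <- matrix(c(TRUE,  TRUE,  FALSE,
                       TRUE,  TRUE,  FALSE,
                       TRUE,  FALSE, FALSE,
                       FALSE, FALSE, TRUE),
                     ncol = 3, byrow = TRUE,
                     dimnames = list(NULL, c("A", "B", "C")))
upset_counts(membership)
```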



Statistical Significance

Depending on the format of the reference file, EpiCompare outputs different plots:

  • Reference dataset has BED6+4 format (peak calling performed with MACS2): EpiCompare generates a paired boxplot per sample showing the distribution of -log10(q-value) of reference peaks that overlap and do not overlap the sample peaks.
  • Reference dataset does not have BED6+4 format: EpiCompare generates a barplot of the percentage of sample peaks overlapping the reference, coloured by the statistical significance (adjusted p-value) of the overlap.

Reference peakfile: H3K4me3_EN4G38_E4_IRpseudo2

Keys:

  • Overlap: Sample peaks in Reference peaks
  • Unique: Sample peaks not in Reference peaks
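
For the BED6+4 case, the data behind each paired boxplot can be sketched in base R. In a MACS2 narrowPeak (BED6+4) file the 9th column already stores -log10(q-value); the overlap flags passed in below are hypothetical input, and `split_qvalues()` is a hypothetical helper.

```r
# Sketch: split reference -log10(q-value) scores by whether each reference
# peak overlaps the sample (overlap flags are hypothetical input here).
split_qvalues <- function(neg_log10_q, overlaps_sample) {
  group <- factor(ifelse(overlaps_sample, "Overlap", "Unique"),
                  levels = c("Overlap", "Unique"))
  split(neg_log10_q, group)
}

q <- c(12.1, 8.4, 3.2, 2.5, 15.0)  # toy -log10(q-value) scores
groups <- split_qvalues(q, c(TRUE, TRUE, FALSE, FALSE, TRUE))
sapply(groups, median)
if (interactive()) boxplot(groups, ylab = "-log10(q-value)")
```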



3. Functional Annotation

3.1 ChromHMM

ChromHMM segments the genome into chromatin states; EpiCompare uses these segmentations to annotate and characterise peaks by chromatin state. The ChromHMM annotations used in EpiCompare were obtained from here.

  • Cell-type annotation file used in this analysis: K562
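
The idea of annotating peaks with chromatin states can be sketched in base R. This assumes segments are a data.frame with chr/start/end/state and labels each peak by the segment containing its midpoint; the midpoint rule, the toy state names and `annotate_states()` are all hypothetical simplifications, not EpiCompare's method.

```r
# Hypothetical sketch: label each peak with the chromatin state of the segment
# containing its midpoint, then tabulate the fraction of peaks per state.
annotate_states <- function(peaks, segments) {
  mid <- (peaks$start + peaks$end) %/% 2
  vapply(seq_len(nrow(peaks)), function(i) {
    hit <- segments$chr == peaks$chr[i] &
      segments$start <= mid[i] & segments$end >= mid[i]
    if (any(hit)) as.character(segments$state[which(hit)[1]]) else NA_character_
  }, character(1))
}

segments <- data.frame(chr = "chr1", start = c(1, 1001), end = c(1000, 2000),
                       state = c("Promoter", "Heterochromatin"))  # toy states
peaks <- data.frame(chr = c("chr1", "chr1", "chr1", "chr2"),
                    start = c(100, 1500, 200, 10), end = c(300, 1600, 400, 20))
states <- annotate_states(peaks, segments)
prop.table(table(states, useNA = "ifany"))  # fraction of peaks per state
```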

All samples

ChromHMM annotation of individual samples.

Overlap: Sample peaks in Reference peaks

Percentage of Sample peaks found in Reference peaks (Reference peakfile: H3K4me3_EN4G38_E4_IRpseudo2)

Sample                       | Percentage (%)
-----------------------------|---------------
H3K4me3_EN4G38_E1_IRpseudo12 | 90.7
H3K4me3_EN4G38_E1_IRpseudo1  | 92.7
H3K4me3_EN4G38_E1_IRpseudo2  | 93.0
H3K4me3_EN4G38_E1_IR12       | 91.9
H3K4me3_EN3G38_E1_IR12       | 92.1
H3K4me3_EN4G38_E2_IRpseudo12 | 88.9
H3K4me3_EN4G38_E2_IRpseudo1  | 89.6
H3K4me3_EN4G38_E2_IRpseudo2  | 90.1
H3K4me3_EN4G38_E2_IR12       | 90.1
H3K4me3_EN4G38_E3_IR12       | 83.9
H3K4me3_EN4G38_E3_IRpseudo1  | 86.1
H3K4me3_EN4G38_E3_IRpseudo2  | 86.5
H3K4me3_EN4G38_E4_IRpseudo12 | 79.8
H3K4me3_EN4G38_E4_IRpseudo1  | 77.6
H3K4me3_EN4G38_E4_IRpseudo2  | 100.0

ChromHMM annotation of sample peaks found in reference peaks.

Overlap: Reference peaks in Sample peaks

Percentage of Reference peaks found in Sample peaks (Reference peakfile: H3K4me3_EN4G38_E4_IRpseudo2)

Sample                       | Percentage (%)
-----------------------------|---------------
H3K4me3_EN4G38_E1_IRpseudo12 | 61.9
H3K4me3_EN4G38_E1_IRpseudo1  | 53.6
H3K4me3_EN4G38_E1_IRpseudo2  | 60.9
H3K4me3_EN4G38_E1_IR12       | 59.8
H3K4me3_EN3G38_E1_IR12       | 57.7
H3K4me3_EN4G38_E2_IRpseudo12 | 57.5
H3K4me3_EN4G38_E2_IRpseudo1  | 51.7
H3K4me3_EN4G38_E2_IRpseudo2  | 55.8
H3K4me3_EN4G38_E2_IR12       | 56.3
H3K4me3_EN4G38_E3_IR12       | 62.0
H3K4me3_EN4G38_E3_IRpseudo1  | 59.6
H3K4me3_EN4G38_E3_IRpseudo2  | 58.9
H3K4me3_EN4G38_E4_IRpseudo12 | 92.1
H3K4me3_EN4G38_E4_IRpseudo1  | 73.0
H3K4me3_EN4G38_E4_IRpseudo2  | 100.0

ChromHMM annotation of reference peaks found in sample peaks.

Unique: Sample peaks not in Reference peaks

Percentage of sample peaks not found in reference peaks (Reference peakfile: H3K4me3_EN4G38_E4_IRpseudo2)

Sample                       | Percentage (%)
-----------------------------|---------------
H3K4me3_EN4G38_E1_IRpseudo12 | 9.33
H3K4me3_EN4G38_E1_IRpseudo1  | 7.28
H3K4me3_EN4G38_E1_IRpseudo2  | 7.04
H3K4me3_EN4G38_E1_IR12       | 8.10
H3K4me3_EN3G38_E1_IR12       | 7.90
H3K4me3_EN4G38_E2_IRpseudo12 | 11.10
H3K4me3_EN4G38_E2_IRpseudo1  | 10.40
H3K4me3_EN4G38_E2_IRpseudo2  | 9.94
H3K4me3_EN4G38_E2_IR12       | 9.89
H3K4me3_EN4G38_E3_IR12       | 16.10
H3K4me3_EN4G38_E3_IRpseudo1  | 13.90
H3K4me3_EN4G38_E3_IRpseudo2  | 13.50
H3K4me3_EN4G38_E4_IRpseudo12 | 20.20
H3K4me3_EN4G38_E4_IRpseudo1  | 22.40
H3K4me3_EN4G38_E4_IRpseudo2  | 0.00

ChromHMM annotation of sample peaks not found in reference peaks.

Unique: Reference peaks not in Sample peaks

Percentage of reference peaks not found in sample peaks (Reference peakfile: H3K4me3_EN4G38_E4_IRpseudo2)

Sample                       | Percentage (%)
-----------------------------|---------------
H3K4me3_EN4G38_E1_IRpseudo12 | 38.1
H3K4me3_EN4G38_E1_IRpseudo1  | 46.4
H3K4me3_EN4G38_E1_IRpseudo2  | 39.1
H3K4me3_EN4G38_E1_IR12       | 40.2
H3K4me3_EN3G38_E1_IR12       | 42.3
H3K4me3_EN4G38_E2_IRpseudo12 | 42.5
H3K4me3_EN4G38_E2_IRpseudo1  | 48.3
H3K4me3_EN4G38_E2_IRpseudo2  | 44.2
H3K4me3_EN4G38_E2_IR12       | 43.7
H3K4me3_EN4G38_E3_IR12       | 38.0
H3K4me3_EN4G38_E3_IRpseudo1  | 40.4
H3K4me3_EN4G38_E3_IRpseudo2  | 41.1
H3K4me3_EN4G38_E4_IRpseudo12 | 7.9
H3K4me3_EN4G38_E4_IRpseudo1  | 27.0
H3K4me3_EN4G38_E4_IRpseudo2  | 0.0

ChromHMM annotation of reference peaks not found in sample peaks.


3.2 ChIPseeker

EpiCompare uses ChIPseeker::annotatePeak() to annotate each peak with its nearest gene and with the genomic region in which it lies (e.g. promoter, exon, intron or intergenic). Genes are taken from the human genome annotations (hg19 or hg38) provided by Bioconductor.
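
Nearest-gene assignment can be sketched in base R. The gene table (chr, gene, tss, strand) and `nearest_gene()` are hypothetical; distances are measured from the peak midpoint to each TSS and signed by gene strand (positive = downstream), a simplification of what annotatePeak() reports.

```r
# Hypothetical sketch: link each peak to the gene whose TSS is closest to the
# peak midpoint; distance is positive when the peak lies downstream of the TSS.
nearest_gene <- function(peaks, genes) {
  mid <- (peaks$start + peaks$end) %/% 2
  res <- lapply(seq_len(nrow(peaks)), function(i) {
    g <- genes[genes$chr == peaks$chr[i], , drop = FALSE]
    if (nrow(g) == 0) return(data.frame(gene = NA_character_, distance = NA_real_))
    d <- mid[i] - g$tss
    d <- ifelse(g$strand == "-", -d, d)  # flip sign for minus-strand genes
    j <- which.min(abs(d))
    data.frame(gene = as.character(g$gene[j]), distance = d[j])
  })
  do.call(rbind, res)
}

genes <- data.frame(chr = "chr1", gene = c("G1", "G2"),
                    tss = c(1000, 5000), strand = c("+", "-"))
peaks <- data.frame(chr = "chr1", start = c(1100, 4700), end = c(1300, 4900))
nearest_gene(peaks, genes)
```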


3.3 Functional Enrichment Analysis

EpiCompare performs KEGG pathway and Gene Ontology (GO) enrichment analysis using clusterProfiler. Peaks are first assigned to their nearest genes with ChIPseeker::annotatePeak(), using the human genome annotations (hg19 or hg38) provided by Bioconductor; clusterProfiler then identifies biological themes among these genes using KEGG pathways and GO terms.
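
Over-representation analysis of the kind clusterProfiler's enrichKEGG() and enrichGO() provide rests on a one-sided hypergeometric test, followed by multiple-testing adjustment (Benjamini-Hochberg by default). A base-R sketch with toy numbers; `ora_pvalue()` and the term names are hypothetical.

```r
# Sketch of a one-sided hypergeometric (over-representation) test:
# N background genes, M of them in a pathway/GO term, n peak-associated genes,
# k of which are in that term. Returns P(X >= k).
ora_pvalue <- function(k, n, M, N) {
  phyper(k - 1, M, N - M, n, lower.tail = FALSE)
}

# Toy numbers for three hypothetical terms, then Benjamini-Hochberg adjustment
p <- c(termA = ora_pvalue(12, 200, 100, 20000),
       termB = ora_pvalue(3, 200, 150, 20000),
       termC = ora_pvalue(8, 200, 60, 20000))
p.adjust(p, method = "BH")
```

For termA only about one pathway gene is expected by chance (200 x 100 / 20000), so observing 12 gives a very small p-value.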


KEGG


GO


3.4 Peak Frequency around TSS

This section plots the frequency of peaks around transcription start sites (TSSs). The TSS region is defined as the sequence flanking each TSS, from 3000 bp upstream (-3000 bp) to 3000 bp downstream (+3000 bp), and the frequency of peaks at each position in this window is plotted. The faint coloured band around the main frequency line shows the 95% confidence interval estimated by bootstrapping.

## >> Running bootstrapping for tag matrix...        2022-06-07 20:29:52
## >> Running bootstrapping for tag matrix...        2022-06-07 20:31:31
## >> Running bootstrapping for tag matrix...        2022-06-07 20:33:14
## >> Running bootstrapping for tag matrix...        2022-06-07 20:35:01
## >> Running bootstrapping for tag matrix...        2022-06-07 20:36:34
## >> Running bootstrapping for tag matrix...        2022-06-07 20:38:19
## >> Running bootstrapping for tag matrix...        2022-06-07 20:39:54
## >> Running bootstrapping for tag matrix...        2022-06-07 20:41:45
## >> Running bootstrapping for tag matrix...        2022-06-07 20:43:31
## >> Running bootstrapping for tag matrix...        2022-06-07 20:45:34
## >> Running bootstrapping for tag matrix...        2022-06-07 20:47:32
## >> Running bootstrapping for tag matrix...        2022-06-07 20:49:58
## >> Running bootstrapping for tag matrix...        2022-06-07 20:52:50
## >> Running bootstrapping for tag matrix...        2022-06-07 20:55:26
## >> Running bootstrapping for tag matrix...        2022-06-07 20:58:50
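
The TSS profile and its confidence band can be illustrated with a base-R sketch. It assumes peaks are a data.frame of closed chr/start/end intervals and TSSs a data.frame with chr, pos and strand; `tag_matrix()` and `tss_profile()` are hypothetical names, and the percentile bootstrap over TSS rows is one reasonable choice, not necessarily what ChIPseeker does internally.

```r
# Hypothetical sketch (base R): rows = TSSs, columns = offsets -flank..+flank
# relative to the direction of transcription; cells count covering peaks.
tag_matrix <- function(peaks, tss, flank = 3000) {
  offsets <- -flank:flank
  mat <- matrix(0L, nrow = nrow(tss), ncol = length(offsets))
  for (i in seq_len(nrow(tss))) {
    p <- peaks[peaks$chr == tss$chr[i], , drop = FALSE]
    if (nrow(p) == 0) next
    # Upstream of a minus-strand TSS lies at higher genomic coordinates
    pos <- if (tss$strand[i] == "-") tss$pos[i] - offsets else tss$pos[i] + offsets
    mat[i, ] <- vapply(pos, function(x) sum(p$start <= x & p$end >= x), integer(1))
  }
  mat
}

# Normalised frequency per offset, plus a percentile bootstrap CI obtained by
# resampling TSS rows with replacement.
tss_profile <- function(mat, n_boot = 500, conf = 0.95) {
  freq <- function(m) {
    total <- sum(m)
    if (total == 0) colSums(m) else colSums(m) / total
  }
  boots <- replicate(n_boot, freq(mat[sample(nrow(mat), replace = TRUE), , drop = FALSE]))
  alpha <- (1 - conf) / 2
  list(freq  = freq(mat),
       lower = apply(boots, 1, quantile, probs = alpha, names = FALSE),
       upper = apply(boots, 1, quantile, probs = 1 - alpha, names = FALSE))
}

peaks <- data.frame(chr = "chr1", start = c(98, 100), end = c(102, 104))
tss <- data.frame(chr = "chr1", pos = c(100, 100), strand = c("+", "-"))
set.seed(1)  # the bootstrap band depends on the random seed
prof <- tss_profile(tag_matrix(peaks, tss, flank = 5), n_boot = 200)
```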

4. Session Info

## R version 4.2.0 (2022-04-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Monterey 12.3.1
## 
## Matrix products: default
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] zh_CN.UTF-8/zh_CN.UTF-8/zh_CN.UTF-8/C/zh_CN.UTF-8/zh_CN.UTF-8
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] rtracklayer_1.56.0   GenomicRanges_1.48.0 GenomeInfoDb_1.32.2  IRanges_2.30.0       S4Vectors_0.34.0     BiocGenerics_0.42.0 
## [7] EpiCompare_1.0.0    
## 
## loaded via a namespace (and not attached):
##   [1] utf8_1.2.2                               tidyselect_1.1.2                         htmlwidgets_1.5.4                       
##   [4] RSQLite_2.2.14                           AnnotationDbi_1.58.0                     grid_4.2.0                              
##   [7] BiocParallel_1.30.3                      scatterpie_0.1.7                         munsell_0.5.0                           
##  [10] codetools_0.2-18                         colorspace_2.0-3                         GOSemSim_2.22.0                         
##  [13] TxDb.Hsapiens.UCSC.hg38.knownGene_3.15.0 Biobase_2.56.0                           filelock_1.0.2                          
##  [16] highr_0.9                                knitr_1.39                               rstudioapi_0.13                         
##  [19] DOSE_3.22.0                              labeling_0.4.2                           MatrixGenerics_1.8.0                    
##  [22] GenomeInfoDbData_1.2.8                   polyclip_1.10-0                          seqPattern_1.28.0                       
##  [25] bit64_4.0.5                              farver_2.1.0                             downloader_0.4                          
##  [28] vctrs_0.4.1                              treeio_1.20.0                            generics_0.1.2                          
##  [31] xfun_0.31                                BiocFileCache_2.4.0                      R6_2.5.1                                
##  [34] graphlayouts_0.8.0                       locfit_1.5-9.5                           bitops_1.0-7                            
##  [37] BRGenomics_1.8.0                         cachem_1.0.6                             fgsea_1.22.0                            
##  [40] gridGraphics_0.5-1                       DelayedArray_0.22.0                      assertthat_0.2.1                        
##  [43] vroom_1.5.7                              promises_1.2.0.1                         BiocIO_1.6.0                            
##  [46] scales_1.2.0                             ggraph_2.0.5                             enrichplot_1.16.1                       
##  [49] gtable_0.3.0                             tidygraph_1.2.1                          rlang_1.0.2                             
##  [52] genefilter_1.78.0                        splines_4.2.0                            lazyeval_0.2.2                          
##  [55] impute_1.70.0                            plyranges_1.16.0                         BiocManager_1.30.18                     
##  [58] yaml_2.3.5                               reshape2_1.4.4                           crosstalk_1.2.0                         
##  [61] GenomicFeatures_1.48.3                   httpuv_1.6.5                             qvalue_2.28.0                           
##  [64] clusterProfiler_4.4.2                    tools_4.2.0                              ggplotify_0.1.0                         
##  [67] gridBase_0.4-7                           ggplot2_3.3.6                            ellipsis_0.3.2                          
##  [70] gplots_3.1.3                             jquerylib_0.1.4                          RColorBrewer_1.1-3                      
##  [73] Rcpp_1.0.8.3                             plyr_1.8.7                               progress_1.2.2                          
##  [76] zlibbioc_1.42.0                          purrr_0.3.4                              RCurl_1.98-1.6                          
##  [79] prettyunits_1.1.1                        viridis_0.6.2                            SummarizedExperiment_1.26.1             
##  [82] ggrepel_0.9.1                            magrittr_2.0.3                           data.table_1.14.2                       
##  [85] DO.db_2.9                                matrixStats_0.62.0                       hms_1.1.1                               
##  [88] patchwork_1.1.1                          mime_0.12                                evaluate_0.15                           
##  [91] xtable_1.8-4                             XML_3.99-0.9                             gridExtra_2.3                           
##  [94] compiler_4.2.0                           biomaRt_2.52.0                           tibble_3.1.7                            
##  [97] KernSmooth_2.23-20                       crayon_1.5.1                             shadowtext_0.1.2                        
## [100] htmltools_0.5.2                          ggfun_0.0.6                              later_1.3.0                             
## [103] tzdb_0.3.0                               tidyr_1.2.0                              geneplotter_1.74.0                      
## [106] aplot_0.1.6                              DBI_1.1.2                                tweenr_1.0.2                            
## [109] ChIPseeker_1.32.0                        genomation_1.28.0                        dbplyr_2.2.0                            
## [112] MASS_7.3-57                              rappdirs_0.3.3                           boot_1.3-28                             
## [115] Matrix_1.4-1                             readr_2.1.2                              cli_3.3.0                               
## [118] parallel_4.2.0                           igraph_1.3.1                             pkgconfig_2.0.3                         
## [121] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2  GenomicAlignments_1.32.0                 plotly_4.10.0                           
## [124] xml2_1.3.3                               ggtree_3.4.0                             annotate_1.74.0                         
## [127] bslib_0.3.1                              XVector_0.36.0                           yulab.utils_0.0.4                       
## [130] stringr_1.4.0                            digest_0.6.29                            Biostrings_2.64.0                       
## [133] rmarkdown_2.14                           fastmatch_1.1-3                          tidytree_0.3.9                          
## [136] restfulr_0.0.14                          curl_4.3.2                               shiny_1.7.1                             
## [139] Rsamtools_2.12.0                         gtools_3.9.2.1                           rjson_0.2.21                            
## [142] lifecycle_1.0.1                          nlme_3.1-157                             jsonlite_1.8.0                          
## [145] viridisLite_0.4.0                        BSgenome_1.64.0                          fansi_1.0.3                             
## [148] pillar_1.7.0                             lattice_0.20-45                          KEGGREST_1.36.0                         
## [151] fastmap_1.1.0                            httr_1.4.3                               plotrix_3.8-2                           
## [154] survival_3.3-1                           GO.db_3.15.0                             interactiveDisplayBase_1.34.0           
## [157] glue_1.6.2                               UpSetR_1.4.0                             png_0.1-7                               
## [160] BiocVersion_3.15.2                       bit_4.0.4                                sass_0.4.1                              
## [163] ggforce_0.3.3                            stringi_1.7.6                            blob_1.2.3                              
## [166] DESeq2_1.36.0                            org.Hs.eg.db_3.15.0                      AnnotationHub_3.4.0                     
## [169] caTools_1.18.2                           memoise_2.0.1                            dplyr_1.0.9                             
## [172] ape_5.6-2